docs(blog): add engineering post on the code evaluators runtime by wochinge · Pull Request #3072 · langfuse/langfuse-docs

wochinge · 2026-06-08T14:01:40Z

Summary

Adds the engineering blog post "Designing the runtime for Langfuse code evaluators" (/blog/2026-06-12-code-evaluators-execution-model): requirements, the options we ruled out, the Lambda + tenant-isolation design, and how it held up in production.
Replaces the earlier draft ("running customer code safely") based on review feedback: descriptive title, less incident-led, more technical depth, magnitudes instead of exact tenant/run numbers.

Linear

LF-2189

vercel · 2026-06-08T14:01:46Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
langfuse-docs	Ready	Preview, Comment	Jun 22, 2026 7:42am

github-actions · 2026-06-08T14:01:56Z

@claude review

claude

LGTM, straightforward blog post addition.

Extended reasoning...

Overview

This PR adds a single MDX blog post (content/blog/2026-06-08-running-customer-code-safely.mdx) plus two PNG screenshots for redacted GuardDuty findings. No code, components, or build configuration changes.

Security risks

None for this docs repo. The post discusses Langfuse's production security architecture for running untrusted user code, but the content itself is descriptive prose — no secrets, credentials, or sensitive endpoints exposed (GuardDuty screenshots are described as redacted). The author key tobiaswochinger is already present in data/authors.json, and all referenced components (BlogHeader, Video, Frame) are standard MDX components used throughout the blog.

Level of scrutiny

Low. This is a pure content addition to a docs site — no logic, no configuration that affects production, no shared component changes. The Vercel preview build provides immediate visual verification, and the worst-case failure mode is a broken page render that's easy to spot and revert.

Other factors

No bugs were flagged by the bug hunting system. The PR description explicitly notes the intentional omission of ogImage to avoid a missing-asset reference. Prose, links, and Mermaid diagram all look internally consistent.

claude

Docs-only blog post addition, looks good.

Extended reasoning...

Overview

This PR adds a single engineering blog post (content/blog/2026-06-08-running-customer-code-safely.mdx) plus two redacted GuardDuty screenshots under public/images/blog/2026-06-08-running-customer-code-safely/. No functional code, no schema, no config — just an MDX file using already-established components (BlogHeader, globally-registered Frame/Video, mermaid code block).

Security risks

None. The post is content only and renders through the standard MDX pipeline. The screenshots are pre-redacted GuardDuty captures (already vetted as part of authoring). The content discusses Langfuse's own security model — no secrets or credentials in the post itself.

Level of scrutiny

Low. Frontmatter, BlogHeader props, and component usage match existing 2026 posts. Image paths in the diff resolve to the two new files included in the same PR. No build-affecting changes.

Other factors

The only finding from the bug hunter is a nit about omitting the date= prop on BlogHeader. It is purely cosmetic (an empty span renders instead of a visible date string), the bug hunter acknowledges other recent engineering posts also omit it, and it does not block the build or feed indexing — the frontmatter date is still authoritative. Not worth blocking on.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d52abcfc24

hassiebp

Really like this iteration, thanks Tobi! Can you please cycle in @annabellscha to get this ready for website + HackerNews? Marc also wanted to do a loop here.

claude

LGTM — prior review feedback and bot findings are addressed in the latest revisions.

Extended reasoning...

Overview

This PR adds a single engineering blog post — content/blog/2026-06-12-code-evaluators-execution-model.mdx — plus two GuardDuty screenshots under public/images/blog/2026-06-12-code-evaluators-execution-model/, and removes a stray out/_headers file that was accidentally tracked (the out/ directory is gitignored). No code logic, routing, build config, or component changes.

Security risks

None meaningful. The content is prose plus two screenshots that the author already scrubbed of identifying ASN/org information per Steffen911's earlier request. The out/_headers deletion does not affect the Vercel deployment (the primary deploy target shown in the PR timeline); on Cloudflare static-export deployments it would remove an X-Robots-Tag: noindex header, but out/ is build output that should not be in version control in the first place, so cleaning it up is correct.

Level of scrutiny

Low. This is a docs-only change in content/blog/ — a directory whose only consumer is the fumadocs-mdx blog source. The post has been through multiple reviewer rounds (annabellscha, AkioNuernberger, Steffen911, hassiebp) and multiple bot rounds (mine, codex). The latest commits explicitly target review feedback: title rephrased away from incident framing, exact tenant/run numbers genericized, DSL defined, in-process-isolation paragraph expanded with concrete failure modes, size limits now match content/docs/evaluation/evaluation-methods/code-evaluators.mdx:476-478 exactly, image folder renamed to match post slug, smart quotes normalized, pip install→uv add suggestion incorporated. Spot-checking the file confirms no remaining curly quotes (grep for U+2018/19/201C/201D returns nothing) and no unresolved {{…}} placeholders.
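
The spot check described above (no curly quotes, no unresolved double-brace placeholders) could be approximated with a short script. This is only a sketch and not the check that was actually run for this review: the function name `find_issues`, the placeholder regex, and the sample text are all assumptions made for illustration.

```python
import re

# Typographic quotes named in the review: U+2018, U+2019, U+201C, U+201D.
CURLY_QUOTES = "\u2018\u2019\u201c\u201d"
# Double-brace template tokens such as {{TENANT_COUNT}} (assumed shape).
PLACEHOLDER = re.compile(r"\{\{[^{}]*\}\}")


def find_issues(text):
    """Return (line_number, kind, snippet) tuples for each suspicious match."""
    issues = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        # Report placeholders first, then any curly quotes on the same line.
        for match in PLACEHOLDER.finditer(line):
            issues.append((lineno, "placeholder", match.group(0)))
        for char in line:
            if char in CURLY_QUOTES:
                issues.append((lineno, "curly-quote", char))
    return issues


# Hypothetical sample input, not taken from the actual post.
sample = "Used by {{TENANT_COUNT}} teams.\nIt\u2019s fast.\nClean line."
for issue in find_issues(sample):
    print(issue)
```

A real MDX file can contain legitimate double braces (for example an inline JSX style object), so any match from a scan like this still needs a human look before it is treated as an unresolved placeholder.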

Other factors

The PR carries the lgtm label and hassiebp's two outstanding inline asks ("Define what DSL abbreviated for", "Why exactly are they hard to get right?") were both replied to with "done" by the author, with corresponding edits visible in the diff. The bug-hunting system found no bugs in this iteration. Since substantive changes have landed since my prior reviews and all flagged issues are resolved, a final approval is the right action rather than staying silent.

annabellscha

small comments, other than that LGTM

claude

Blog post content with no code changes; previously-flagged issues (placeholders, skeleton file, image folder slug) are all resolved in the current diff and the PR carries the lgtm label after extensive human review.

Extended reasoning...

Overview

This PR adds a single engineering blog post (content/blog/2026-06-22-code-evaluators-execution-model.mdx, 209 lines) plus two GuardDuty screenshots under public/images/blog/2026-06-22-code-evaluators-execution-model/. No application code, no config, no schema, no build/CI changes. The post describes the runtime design for Langfuse code evaluators (AWS Lambda + tenant isolation).

Security risks

None to the application. The post discusses security architecture, but it is prose and images only — there is no code path being changed. I cross-checked that the previously-flagged content concerns from earlier iterations of this PR have been addressed: the {{TENANT_COUNT}} / {{MONTHLY_RUNS}} / {{LAMBDA_COUNT}} placeholders are gone; the SKELETON draft file is gone; numbers have been genericized per reviewer feedback ("thousands of teams", "hundreds of millions of observations per day"); the image folder slug now matches the post slug; the size limits on line 175 now match the canonical docs (256 KB / 5.5 MB / 256 KB split, decimal units); the ASN screenshots that reviewers asked to redact are no longer referenced.

Level of scrutiny

Low. This is a content-only PR for the public blog. No code paths are affected. The fumadocs-mdx auto-discovery means the post will be published as /blog/2026-06-22-code-evaluators-execution-model, which is the intended behavior, and the only failure mode (unresolved MDX expressions) has been verified clean — a quick scan of the file shows no {{...}} placeholders remain and no undefined identifiers in MDX expression positions.

Other factors

The PR already carries the lgtm label after substantive review by multiple humans (hassiebp, Steffen911, AkioNuernberger, annabellscha). I have left several reviews across prior iterations of this PR and have nothing new to add — every concern I previously raised is resolved in the current state, and the bug hunting system found no new issues. Approving rather than staying silent because the prior reviews were on different content (the now-deleted 2026-06-08/2026-06-10/2026-06-12 versions), so a fresh LGTM on the renamed/finalized file is meaningful signal rather than noise.

dosubot Bot added the size:L (This PR changes 100-499 lines, ignoring generated files.) and docs labels Jun 8, 2026

vercel Bot deployed to Preview June 8, 2026 14:04 View deployment

claude Bot reviewed Jun 8, 2026

wochinge requested a review from annabellscha June 8, 2026 14:34

vercel Bot deployed to Preview June 8, 2026 14:36 View deployment

claude Bot reviewed Jun 8, 2026

Comment thread content/blog/2026-06-08-running-customer-code-safely.mdx Outdated

annabellscha reviewed Jun 8, 2026

Comment thread content/blog/2026-06-08-running-customer-code-safely.mdx Outdated

Comment thread content/blog/2026-06-08-running-customer-code-safely.mdx Outdated

Comment thread content/blog/2026-06-08-running-customer-code-safely.mdx Outdated

AkioNuernberger reviewed Jun 8, 2026

Comment thread content/blog/2026-06-08-running-customer-code-safely.mdx Outdated

Comment thread content/blog/2026-06-08-running-customer-code-safely.mdx Outdated

docs(blog): add post on running untrusted code for code evaluators

1b1b216

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

wochinge force-pushed the feature/lf-2189-write-blog-post-about-code-based-evals branch from 76afd47 to 1b1b216 Compare June 8, 2026 21:47

vercel Bot had a problem deploying to Preview June 8, 2026 21:49 Failure

Steffen911 reviewed Jun 9, 2026

hassiebp reviewed Jun 9, 2026

Comment thread content/blog/2026-06-08-running-customer-code-safely.mdx Outdated

hassiebp reviewed Jun 9, 2026

Comment thread content/blog/2026-06-08-running-customer-code-safely.mdx Outdated

hassiebp reviewed Jun 9, 2026

Comment thread content/blog/2026-06-08-running-customer-code-safely.mdx Outdated

hassiebp reviewed Jun 9, 2026

Comment thread content/blog/2026-06-08-running-customer-code-safely.mdx Outdated

Merge branch 'main' into feature/lf-2189-write-blog-post-about-code-b…

d52abcf

…ased-evals

vercel Bot had a problem deploying to Preview June 9, 2026 09:02 Failure

chatgpt-codex-connector Bot reviewed Jun 9, 2026

Comment thread content/blog/2026-06-08-running-customer-code-safely.mdx Outdated

claude Bot reviewed Jun 9, 2026

Comment thread content/blog/2026-06-08-running-customer-code-safely.mdx Outdated

docs(blog): fix unresolved placeholders and add execution-model draft

43915a0

Replace the MDX-breaking {{...}} tokens with magnitude-only figures so the post renders, and add a parallel skeleton draft of the engineering-focused rewrite for side-by-side iteration.

vercel Bot deployed to Preview June 10, 2026 13:38 View deployment

claude Bot reviewed Jun 10, 2026

Comment thread content/blog/2026-06-10-code-evaluators-execution-model.mdx Outdated

wochinge added 2 commits June 11, 2026 11:16

docs(blog): draft prose for execution-model post

039aa4e

First manual pass over the skeleton: rewritten options section, least-privilege stance, and production-incident recap.

Merge remote-tracking branch 'origin/main' into feature/lf-2189-write…

9ad51a2

…-blog-post-about-code-based-evals

vercel Bot deployed to Preview June 11, 2026 12:20 View deployment

wochinge requested a review from Steffen911 June 11, 2026 12:23

claude Bot reviewed Jun 11, 2026

Comment thread content/blog/2026-06-22-code-evaluators-execution-model.mdx

Comment thread content/blog/2026-06-12-code-evaluators-execution-model.mdx Outdated

docs(blog): replace draft with execution-model post, set date to June 12

7d1822d

Drop the original running-customer-code-safely draft in favor of the rewritten engineering post; include redacted GuardDuty screenshot.

wochinge force-pushed the feature/lf-2189-write-blog-post-about-code-based-evals branch from 28c6a23 to 7d1822d Compare June 11, 2026 12:26

vercel Bot deployed to Preview June 11, 2026 12:29 View deployment

claude Bot reviewed Jun 11, 2026

Comment thread content/blog/2026-06-12-code-evaluators-execution-model.mdx Outdated

Comment thread content/blog/2026-06-12-code-evaluators-execution-model.mdx Outdated

docs(blog): address review comments on execution-model post

4482ad4

Fix grammar, align limits with the docs table (separate 256 KB result cap), straighten apostrophes, and rename the image folder to match the post slug.

vercel Bot deployed to Preview June 11, 2026 12:52 View deployment

hassiebp reviewed Jun 11, 2026

Comment thread content/blog/2026-06-12-code-evaluators-execution-model.mdx Outdated

hassiebp reviewed Jun 11, 2026

Comment thread content/blog/2026-06-22-code-evaluators-execution-model.mdx

hassiebp reviewed Jun 11, 2026

Comment thread content/blog/2026-06-12-code-evaluators-execution-model.mdx Outdated

hassiebp self-requested a review June 12, 2026 09:42

hassiebp approved these changes Jun 12, 2026

dosubot Bot added the lgtm (This PR has been approved by a maintainer) label Jun 12, 2026

docs(blog): address second review round on execution-model post

ba724bc

Explain why in-process isolation is hard to get right (single enforcement layer in a shared process), spell out DSL, and soften the agent-sandbox contrast per review feedback.

vercel Bot deployed to Preview June 12, 2026 14:52 View deployment

claude Bot reviewed Jun 12, 2026

annabellscha reviewed Jun 18, 2026

Comment thread content/blog/2026-06-12-code-evaluators-execution-model.mdx Outdated

Comment thread content/blog/2026-06-12-code-evaluators-execution-model.mdx Outdated

wochinge added 5 commits June 22, 2026 09:33

docs(blog): address review feedback on intro section

d6b7788

Link error analysis and fold it into the evaluator pipeline sentence, and reword the code-example lead-in per review.

docs(blog): set execution-model post date to June 22

32ea6b6

Rename post and image folder to today's date; update frontmatter, header, and image paths to match.

Merge branch 'main' into feature/lf-2189-write-blog-post-about-code-b…

d9f7711

…ased-evals

fix(blog): restore accidentally deleted out/_headers

debbdef

The Cloudflare noindex headers file was deleted from the working tree locally and swept into an earlier blog commit. Unrelated to this PR.

Merge remote-tracking branch 'origin/feature/lf-2189-write-blog-post-…

4e87892

…about-code-based-evals' into feature/lf-2189-write-blog-post-about-code-based-evals

vercel Bot deployed to Preview June 22, 2026 07:42 View deployment

claude Bot reviewed Jun 22, 2026

wochinge added this pull request to the merge queue Jun 22, 2026

dosubot Bot added the auto-merge (This PR is set to be merged) label Jun 22, 2026

Merged via the queue into main with commit f080a3a Jun 22, 2026
17 checks passed

wochinge deleted the feature/lf-2189-write-blog-post-about-code-based-evals branch June 22, 2026 12:59

dosubot Bot removed the auto-merge (This PR is set to be merged) label Jun 22, 2026
